median: use nth_element for linear-time selection on dense inputs #40
Conversation
Just to be in sync with the issue, please refer to the community page too!
mmuetzel
left a comment
Thank you for your contribution.
I only had a very cursory look at the patch and didn't run any tests yet.
For now, only a minor style request, and one comment that needs clarification.
I wrote a central helper function, and I changed the comments as per your suggestion too. Passes all TCs.
Why did you remove the BISTs that you added previously? Are they no longer passing?
scripts/statistics/median.m
Outdated
function m = mid_two_vals (m1, m2, is_int)
  if (is_int)
    samesign = sign (m1) == sign (m2);
    m = samesign .* (m1 + (m2 - m1) / 2) + ! samesign .* ((m1 + m2) / 2);
I haven't timed that, but there would probably be a couple fewer arithmetic instructions for something like this:
m(samesign) = (m1(samesign) + (m2(samesign) - m1(samesign)) / 2);
m(! samesign) = ((m1(! samesign) + m2(! samesign)) / 2);
But maybe that is outweighed by the indexing operations. Still, it might be worth testing, given that the motivation of this change is performance.
@mmuetzel They were passing, but since those BISTs were effectively testing impossible inputs rather than real behaviour, I dropped them.
Let me think of a way of benchmarking the hot paths with high precision; in my experience, at very fine granularity the benchmarking methodology becomes important (warmup steps, blocking secondary processes, testing on multiple different architectures, etc.). I will add the BISTs back along with this commit, plus results (there is no such thing as too many BISTs!).
If we had those BISTs, you would have caught earlier that something was wrong with the implementation that you had intermediately. So, having exactly those kinds of tests is useful. Could you please add them again?
done!
b992a65 to bc2087d (force-pushed)
bc2087d to 7d220c1 (force-pushed)
@mmuetzel I ran a simple microbenchmark comparing the current mask-multiplication form with a masked-indexing variant. On my machine (N = 1e7), the mask-multiplication version was consistently faster (~2x). This is likely because it avoids gather/scatter and keeps contiguous memory access, whereas the masked-indexing version introduces extra memory traffic.
@mmuetzel Can you please review it? I also ran a microbenchmark to compare both approaches.

As pointed out in the community discussion, `median (x)` has a hidden cost; this change implements a possible fix.
Summary
This change replaces the full sort used in `median ()` for dense inputs with order-statistic selection via `nth_element`, reducing average complexity from O(n log n) to O(n). Sparse inputs retain the original sort-based code path to preserve historical behavior and output sparsity.

Algorithmic change
- Previous implementation: `sort (x)` followed by indexing the middle element(s)
- New implementation: `nth_element (x, k)` for odd length, `nth_element (x, [k, k+1])` for even length

This computes only the required order statistics without fully sorting.
NaN handling
NaN semantics are preserved explicitly:
- `"omitnan"`: NaNs are filtered before selection.
- `"includenan"`: slices containing NaNs return NaN.

This matches the behavior previously provided implicitly by `sort ()`.

Sparse behavior
Sparse inputs continue to use the original sort-based code path to preserve historical behavior and output sparsity. Dense inputs use the new `nth_element` path.

I went a step further and ran a benchmark to see if it really delivers the promised speedup; the results are below. I am also attaching the benchmarking script.
Performance
Benchmarks on large dense inputs (Octave 9.x, Apple clang):
Speedup grows with input size, consistent with removal of the log n factor.

The benchmarking script can be found here.
The CSV results from which the plots were generated can be found here.
Notes